ABSTRACT

Two plant genome size databases have been 
recently updated and/or extended: the Plant DNA 
C-values database (http://data.kew.org/cvalues), 
and GSAD, the Genome Size in Asteraceae 
database (http://www.asteraceaegenomesize.com). 
While the first provides information on nuclear DNA 
contents across land plants and some algal groups, 
the second is focused on one of the largest and 
most economically important angiosperm families, 
Asteraceae. Genome size data have numerous 
applications: they can be used in comparative 
studies on genome evolution, or as a tool to 
appraise the cost of whole-genome sequencing 
programs. The growing interest in genome size
and the increasing rate of data accumulation have
necessitated the continued update of these databases.
Currently, the Plant DNA C-values database
(Release 6.0, Dec. 2012) contains data for 8510
species, while GSAD has 1219 species (Release
2.0, June 2013), representing increases of 17 and
51%, respectively, in the number of species with
genome size data, compared with previous
releases. Here we provide overviews of the most
recent releases of each database, and outline new 
features of GSAD. The latter include (i) a tool to 
visually compare genome size data between 
species, (ii) the option to export data and (iii) a 
webpage containing information about flow 
cytometry protocols. 

INTRODUCTION 

The total amount of DNA in the unreplicated haploid or 
gametic nucleus of an organism is referred to as the C- 
value or genome size (1), and across eukaryotes it varies 
approximately 66 000-fold (2). The smallest genome so far 
reported is found in the parasitic microsporidian 
Encephalitozoon intestinalis (3,4) with a C-value of just 
2.3 Mb [C-values are usually reported either in terms of
mass (picograms, pg, with 1 pg = 10⁻¹² g) or number of
base pairs, with most estimates given in megabase pairs
or gigabase pairs. N.B. 1 pg = 978 Mb (5)]. At the other
end of the scale, the largest reliable genome size estimate is 
for the angiosperm Paris japonica with a C-value of 
148 880 Mb (2). Interest in this genomic character goes 
back to the late 1940s and early 1950s when researchers 
started to systematically measure and compare DNA 
amounts within and between plants and animals (6-8).
These early studies revealed that genome size was remark- 
ably constant within a species (8), and provided support 
for DNA rather than protein being the hereditary material 
[reviewed in (9)]. Since then interest has remained high as 
genome size has been shown to be a key biodiversity char- 
acter of fundamental biological and evolutionary 
significance (9-11). In addition, knowledge of genome 
size has practical implications, such as estimating the 
cost and time for whole genome sequencing projects 
(12), and selecting protocols for DNA fingerprinting 
studies (13,14). 
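
The unit conversion quoted above (1 pg = 978 Mb) can be illustrated with a short sketch. The function names are illustrative assumptions and are not part of either database; only the conversion factor and the Paris japonica C-value come from the text.

```python
# Conversion factor quoted in the text: 1 pg = 978 Mb (reference 5).
MB_PER_PG = 978.0


def pg_to_mb(picograms: float) -> float:
    """Convert a DNA amount from picograms to megabase pairs."""
    return picograms * MB_PER_PG


def mb_to_pg(megabases: float) -> float:
    """Convert a DNA amount from megabase pairs to picograms."""
    return megabases / MB_PER_PG


if __name__ == "__main__":
    # C-value of Paris japonica as given in the text (148 880 Mb).
    paris_japonica_mb = 148_880
    print(f"{paris_japonica_mb} Mb = {mb_to_pg(paris_japonica_mb):.2f} pg")
```

Under this factor, the Paris japonica estimate corresponds to roughly 152 pg.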

Despite this realization of the importance of genome 
size to both fundamental and applied research, for many 
years it was difficult to know whether a genome size meas- 
urement existed for a particular taxon and if so where to 
find it. This was because values were either scattered in the 
literature or unpublished. Nevertheless, this impediment 
has now been largely overcome by the release of electronic 
databases for several major groups of eukaryotes (15,16): 
animals (http://www.genomesize.com), fungi
(http://www.zbi.ee/fungal-genomesize) and plants
(http://data.kew.org/cvalues and http://www.asteraceaegenomesize.com).
Together these databases currently contain data for
>15 000 species comprising 4972 animals, 1581 fungi and
8922 plants.

Interest in the field of genome size research remains high 
and new genome size data continue to be published in the 
literature. Thus, keeping the databases up to date has 
necessitated the continued release of new versions. This 
article focuses on the two open-access plant genome size 
databases, which have recently been updated: the Plant 
DNA C-values database (Release 6.0, December 2012, 
http://data.kew.org/cvalues) and the Genome Size in
Asteraceae database (GSAD; Release 2.0, June 2013,
http://www.asteraceaegenomesize.com).

THE PLANT DNA C-VALUES DATABASE 

The Plant DNA C-values database (http://data.kew.org/cvalues)
was first launched in 2001 to provide a user-
friendly searchable database where both published and 
unpublished values of plant genome size could be readily 
found (15,17). It contained data for 3864 species that had 
been compiled and published by Bennett and colleagues in 
hard copy between 1976 and 2000 (18-23). Since 2001, the 
increasing volume and rate of production of new data on 
plant genome sizes (Figure 1) has led to five further 
updates of the database, with the most recent release 
(Release 6.0, December 2012) containing data for 8510 
species compiled from 808 original reference sources. 
The majority (89%) of estimates are for angiosperms 
(7542 species from 695 references), with the others 
comprising 365 gymnosperms (from 48 references), 128 
pteridophytes (comprising monilophytes and lycophytes 
from 21 references), 232 bryophytes (from seven refer- 
ences) and 253 algae (from 37 references) (Figure 2). A 
detailed description of the organization, search options 
and output fields in the Plant DNA C-values database 
has already been given in (15) and is also available from 
the 'Help' web page of the database
(http://data.kew.org/cvalues/searchguide.html). This outlines
the diverse and flexible search options available to enable
the user to interrogate the database. For example, the user
can choose to (i) search the whole database, or just a subset
of it (e.g. just angiosperms), (ii) restrict searches to a
specific range of DNA amounts, chromosome numbers and/or
ploidy levels, (iii) restrict searches to a particular family
or higher-order plant group and (iv) conduct wild card
searches. In addition, the various options available for
displaying the results of the search are given, such as the
choice to output the data as 1C, 2C or 4C values in Mb or pg,
and to sort the results by DNA amount, chromosome number,
ploidy level or taxonomically (e.g. alphabetically by family,
genus, species).

Figure 1. Mean number of plant genome size estimates reported per
year over 12 successive 5-year periods and the 3-year period 2010-2012
(dotted line), between 1950 and 2012. Data taken from the Plant DNA
C-values database (Release 6.0, December 2012).
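
The search and display options described above can be sketched in a few lines of code. This is a minimal illustration only: the record layout, the function names and the example values are assumptions made for the sketch (the values are invented placeholders, not measurements), and it does not reproduce how the database itself is implemented.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

MB_PER_PG = 978.0  # conversion factor quoted in the text (1 pg = 978 Mb)


@dataclass
class Record:
    """Hypothetical record layout; the real database schema may differ."""
    family: str
    species: str
    one_c_pg: float         # 1C-value in picograms
    chromosome_number: int  # 2n
    ploidy: int             # ploidy level (x)


# Invented placeholder values for illustration, not real measurements.
EXAMPLE_RECORDS = [
    Record("Family X", "Species a", 1.5, 18, 2),
    Record("Family X", "Species b", 6.0, 36, 4),
    Record("Family Y", "Species c", 0.8, 14, 2),
]


def search(records: Iterable[Record],
           family: Optional[str] = None,
           min_pg: Optional[float] = None,
           max_pg: Optional[float] = None,
           ploidy: Optional[int] = None) -> List[Record]:
    """Filter by family, 1C range and ploidy; sort by DNA amount."""
    hits = []
    for r in records:
        if family is not None and r.family != family:
            continue
        if min_pg is not None and r.one_c_pg < min_pg:
            continue
        if max_pg is not None and r.one_c_pg > max_pg:
            continue
        if ploidy is not None and r.ploidy != ploidy:
            continue
        hits.append(r)
    return sorted(hits, key=lambda r: r.one_c_pg)


def display_value(record: Record, c_level: int = 1, unit: str = "pg") -> float:
    """Return the 1C, 2C or 4C value of a record in pg or Mb."""
    if c_level not in (1, 2, 4):
        raise ValueError("c_level must be 1, 2 or 4")
    if unit not in ("pg", "Mb"):
        raise ValueError("unit must be 'pg' or 'Mb'")
    value = record.one_c_pg * c_level
    return value * MB_PER_PG if unit == "Mb" else value


if __name__ == "__main__":
    for rec in search(EXAMPLE_RECORDS, family="Family X", max_pg=5.0):
        print(rec.species, display_value(rec, c_level=2, unit="Mb"), "Mb (2C)")
```

Storing a single 1C-value in pg and deriving the other output forms on request keeps the records free of redundant, possibly inconsistent, values.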

It is noted that the Plant DNA C-values database does 
not currently display information about which calibration 
standard has been used to estimate the genome size of a 
particular species, despite the realization that choice of 
standard and its assumed C-value are two of the major 
factors contributing to artifactual genome size variation, 
as outlined in Doležel and Greilhuber (24) and Suda and
Leitch (25). Clearly there is a need to deal with these im- 
portant issues and to reach a consensus on the selection of 
appropriate calibration standards and uniformity on the 
C-values assumed. However, as an interim measure, the 
option to display the standard species used will be 
included in the next release of the database. 

What is new in Release 6.0 of the Plant DNA C-values 
database 

Compared with the previous release of the Plant DNA C- 
values database (i.e. Release 5.0, December 2010 with data 
for 7058 species), the number of species included has 
increased by 17%. Analysis shows that 2010-2012 had
the highest rate of plant genome size data generation
known, with approximately 460 species not previously
listed in the database ('new' species) added per year,
and >600 estimates in total per year (Figure 1).

Figure 2. Growth of the Plant DNA C-values database in terms of the total number of species represented in the whole database (diagonal
hatch) and for each individual group (squares = angiosperms; light gray = gymnosperms; white = pteridophytes; black = bryophytes; dark
gray = algae).

Angiosperms 

Most of the novel additions to the database have come 
from research in angiosperms, where data for 1255 species 
not previously listed have been added. Not only has this 
increased the percentage of angiosperm species with 
genome size data to approximately 2.1% [based on an 
estimate of 352000 angiosperm species in total, (26)], 
but representation at the generic and family levels has 
also improved. At the generic level, the new release 
includes estimates for 187 genera not previously listed 
and brings the number with at least one genome size 
estimate to 1635, corresponding to 12.6% of the 12 962 
genera recognized (27). The database also includes 
genome sizes for 249 families, although only nine 
families not previously represented in the database were 
added in Release 6.0. Of the 415 families currently 
recognized (28), 60% have at least one genome size 
estimate. 
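
The representation figures quoted in this section follow directly from the counts given in the text, as the following sketch shows. The helper name is an illustrative assumption; the counts are those stated above (7542 angiosperm species, 1635 genera and 249 families with data, against estimated totals of 352 000, 12 962 and 415).

```python
def percent_represented(with_data: int, recognized: int) -> float:
    """Percentage of recognized taxa with at least one genome size estimate."""
    return 100.0 * with_data / recognized


# (taxa with data, taxa recognized), as quoted in the text.
ANGIOSPERM_COUNTS = {
    "species": (7542, 352_000),
    "genera": (1635, 12_962),
    "families": (249, 415),
}

if __name__ == "__main__":
    for rank, (with_data, recognized) in ANGIOSPERM_COUNTS.items():
        pct = percent_represented(with_data, recognized)
        print(f"{rank}: {with_data} of {recognized} ({pct:.1f}%)")
```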

Other land plant groups and algae 

In other land plant groups, the most notable progress in 
improving genome size representation has been in the 
gymnosperms where the number of 'new' species has 
increased by 43%. This is largely due to several recent 
surveys by Zonneveld (29-31), which together have 
generated data for all cycad genera and 64 of the 69
conifer genera now recognized (32). Consequently,
genome size data are now available for 35% of gymnosperm
species [355 out of the 1026 species recognized
by (32)], including representatives of all 12 gymnosperm
families, and 98% of the genera [81 out of 83 genera
recognized by (32)]. Gymnosperms are the best represented
of all land plant groups in terms of genome size (Table 1).

Progress in other land plant groups and algae remains 
poor, with the addition of only 46 pteridophyte species not 
previously included in the database and no new data for 
bryophyte or algal species. Nevertheless, this will be ad- 
dressed in Release 7.0 planned for 2014 as new genome 
size data for the bryophyte groups liverworts [67 species 
from 33 families, (33)] and hornworts [24 species from 5 
families, (34)] will be added, together with new data for 
algae [e.g. (35-38)] and other land plant groups collated 
from the literature. 

The Plant DNA C-values database provides insights into 
plant genome size diversity 

Overall, analysis of the data available in the Plant DNA 
C-values database illustrates the considerable diversity in 
genome sizes between the different land plant and algal 
groups, both in terms of the range of genome sizes 
encountered and the distribution of genome sizes 
(Figure 3, Table 1). Such different genome size profiles 
highlight the contrasting genome size dynamics operating 
between plant lineages (39,40) and argue strongly for the 
need to continue to collate and analyze genome sizes
across the plant tree of life to form a more holistic
understanding of plant genomic diversity.

Table 1. Minimum (Min.), maximum (Max.) and mean 2C-values for each plant group represented in the Plant DNA C-values database
(Release 6.0, Dec. 2012), together with percentage representation of species in each group

Plant group Min. Max. Mean Range Approximate Number of species in Approximate % species

Figure 3. Histograms showing the distribution of genome sizes in the different plant groups using data taken from the Plant DNA C-values database
(Release 6.0, December 2012).

THE GSAD 

GSAD (http://www.asteraceaegenomesize.com) provides 
genome size data specifically for Asteraceae (Compositae), 
which are considered to be one of the largest plant 
families (24 000-30 000 species) with a worldwide
distribution, except Antarctica. Overall, Asteraceae account for
approximately 7-9% of angiosperm species on Earth
and include many economically important representatives
such as those used for food (e.g. artichoke, Cynara
cardunculus; sunflower, Helianthus annuus), medicine
(e.g. artemisinin, an active compound against malaria
extracted from the sweet wormwood, Artemisia annua)
and horticulture (e.g. Chrysanthemum and Dahlia species
and hybrids), or which are invasive noxious weeds (e.g.
Taraxacum). This family has been the target of numerous
molecular systematic and genomic studies (e.g. 41-43) and
the focus of evolutionary-developmental research such as
floral development in Gerbera or Helianthus (44). The sun- 
flower is also the subject of an ongoing whole genome 
sequencing project (45), with the current release containing 
data for >80% of the genome (45,46). 

Development of GSAD was initiated by research 
groups based at the Universitat de Barcelona and
Institut Botànic de Barcelona (IBB-CSIC-ICUB) in
collaboration with a team from the Université de Paris
Sud-CNRS (http://www.etnobiofic.cat). It arose from
their long-term scientific interest in Asteraceae, particu- 
larly from a genome size perspective (16,47-51). Given 
the large amounts of genome size data for Asteraceae 
generated by these and other research groups, the 
decision to develop and curate an online genome size 
database focused specifically on Asteraceae was taken. 
The aim was to complement the Plant DNA C-values
database in the same way that the Index to
Chromosome Numbers in Asteraceae
(http://www.lib.kobe-u.ac.jp/infolib/meta_pub/G0000003asteraceae_e)
complements the more general Index to Plant
Chromosome Numbers (http://www.tropicos.org/Project/IPCN).
Additionally, GSAD provides data for hybrid
taxa, varieties, forms and cultivars of Asteraceae, which 
are not usually included in the Plant DNA C-values 
database [e.g. see (10,17)]. 

GSAD was launched in July 2010 (Release 1.0) and a 
detailed description of its content and organization is 
given in Garnatje et al. (16). In the 3 years since the first 
release, data for a further 412 species have been collated 
from 40 publications (either already published or in press 
by June 2013) reflecting both the continued scientific 
interest in the field and the inclusion of previously
overlooked articles (Figure 4). These new data have been
incorporated into the new release (Release 2.0, June
2013), which contains genome sizes for 1219 species,
186 genera, 20 tribes and six subfamilies compiled from 133
original references. Currently, GSAD contains C-value data for
approximately 400 species of Asteraceae not listed in the
Plant DNA C-values database (Release 6.0, December
2012). Many of the additional species are from unpublished
data and hence were not available for inclusion in
the Plant DNA C-values database. In addition, GSAD
includes some data that have been published or were in
press in 2013 and hence were not included in the 2012
release of the Plant DNA C-values database. GSAD is
currently the only genome size database focused on a
single plant family.

Figure 4. Mean number of Asteraceae genome size estimates reported
per year over 9 successive 5-year periods and the 4-year period
2010-2013 (dotted line), between 1965 and 2013. Data taken from GSAD
(Release 2.0, June 2013).

Database content update 

Overall, the total number of species and genera listed 
in GSAD has grown by 51 and 72%, respectively. In 
addition, Release 2.0 now includes some well-known 
genera such as Leontopodium and Mutisia for which no 
previous records were available. Table 2 provides infor- 
mation on the percentage of species with genome size data 
for the 6 subfamilies and 20 tribes of Asteraceae represented in GSAD,
together with their minimum, maximum, mean and range 
of C-values. With respect to Release 1.0, the most studied 
genera from a genome size perspective are still the same 
(Table 3), although Hieracium has moved from third 
to second position. Given the increasing rate at which 
new genome size data are being generated (Figure 4), it 
is clear that interest in this key biodiversity character 
in Asteraceae remains high and indeed, seems likely to 
increase in the coming years. 

Web interface features new to Release 2.0 

Release 2.0 of GSAD includes several new features to 
enhance the user's experience. 

(i) A genome size representation tool is now included 
to enable the user to visually compare genome sizes 
for a set of species. This allows genome size differ- 
ences within a given search output to be easily 
compared. A bar, whose size is directly proportional 
to genome size, is shown next to the genome size 
value of the species, together with a red line repre- 
senting the mean value of the genus. 

(ii) Following the recommendations of Bateman on 
how to improve the usability of a database (54), 
another novel feature is the option to export data 
from a search to an Excel™ file, and/or email the 
results. 
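
The principle behind the genome size representation tool in (i), a bar proportional to genome size plus a marker at the genus mean, can be sketched as text output. This is an illustration under stated assumptions and not the GSAD implementation: the function name, the character-based rendering and the example values (invented placeholders) are all hypothetical.

```python
from typing import Dict, List


def render_bars(genome_sizes: Dict[str, float], width: int = 30) -> List[str]:
    """Draw one text bar per species, scaled to the largest value,
    with a '|' marker at the column of the mean of all values."""
    if not genome_sizes:
        raise ValueError("no genome sizes supplied")
    largest = max(genome_sizes.values())
    if largest <= 0:
        raise ValueError("genome sizes must be positive")
    mean = sum(genome_sizes.values()) / len(genome_sizes)
    mean_col = round(width * mean / largest)
    lines = []
    for name, size in genome_sizes.items():
        bar_len = round(width * size / largest)
        cells = ["#" if i < bar_len else " " for i in range(width + 1)]
        cells[mean_col] = "|"  # marker for the mean, drawn over the bar
        lines.append(f"{name:<12}{size:>6.2f} {''.join(cells).rstrip()}")
    return lines


if __name__ == "__main__":
    # Invented placeholder 2C-values (pg) for three species of one genus.
    example = {"species A": 2.0, "species B": 4.0, "species C": 6.0}
    for line in render_bars(example):
        print(line)
```

Scaling every bar to the largest value in the search output keeps the bars comparable within that output, which is the property the tool relies on.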

New page tabs 

Several new page tabs have been created for the new
version.

(i) 'Publications' provides a complete list of
source references, together with a link to the pdf of the
article (accessible if the user/user's institution has
permission, otherwise only the abstract is shown) if available, as
some publications only exist in hard copy.

(ii) 'Protocols and Reagents' contains information on how to
estimate plant genome size by flow cytometry and links to
books on these topics. The aim of this new tab is to help and
guide scientists on how to use flow cytometry to estimate plant
genome size accurately.

(iii) 'Help' has notes on simple and advanced search options,
a table on the methods used to estimate genome size and an
explanation on how the genome size representation tool works.

(iv) 'News' is a blog with information on upcoming meetings,
relevant articles and links related to genome size and Asteraceae.

(v) 'Submit your Data' provides the option for researchers
to send data through a submission form.

(vi) 'What's new?' gives details of updates and improvements
in each new release of the database.

Table 2. Minimum (Min.), maximum (Max.) and mean 2C-values for each of the subfamilies (in bold) and tribes of Asteraceae represented in
GSAD (Release 2.0, June 2013) together with percentage representation of species in each group

Number of species recognized in each subfamily and tribe taken from Kubitzki (52) and Funk et al. (53).

Table 3. A comparison of the number of records for the most widely
represented genera in the GSAD database

[Table body not recovered; surviving column headings: Release 2.0, Increase (%).]

The ranking of the best represented genera is in brackets.

Updates to existing page tabs 

Some tabs in Release 1.0 have been updated. For example, 
the 'Home' tab has a shorter introduction but now 
includes graphs to illustrate data increments from the 



first to the second release in terms of total number of 
estimates, species, genera and references. In addition, 
the number of estimates determined by different measure- 
ment techniques (e.g. flow cytometry, Feulgen micro- 
densitometry) is given. 

On the 'How to cite?' page, there is now a link to the pdf 
of Garnatje et al. (16) outlining the first release of GSAD 
(accessible if the user/user's institution has permission). 
Finally, the 'Links' tab has been expanded to include 
links to other sites containing genome size data and 
related genomic information. 

Future prospects 

The second release of GSAD arose from a considerable 
compilation effort and has led to a significant increase in 
the number of Asteraceae species with genome size data. 
Given this remarkable growth of data in recent years, 
annual updates are planned so that readily accessible 
global knowledge on Asteraceae genome sizes remains 
up to date. Other improvements to GSAD in the near 
future are likely to include the incorporation of links to 
published molecular phylogenetic and sequence data for 
species listed in any given search output, together with 
data for closely related genera, if available. 

Despite the many species already listed, there are still 
conspicuous and important gaps in the knowledge of 
genome size in this large family. Species representation 
only stands at approximately 5%, and C-values are 
missing for most tribes (approximately 60%) and for 6 
of the 12 recognized subfamilies. Nevertheless, the
construction of this database has enabled such gaps to be
highlighted and will hopefully encourage the development 
of working strategies to fill them. In this regard, the fol- 
lowing 5-year targets are proposed to improve representa- 
tion of genome sizes in Asteraceae: to estimate a further 
1200 species, 130 genera, 10 tribes and 6 subfamilies to 
raise taxonomic representation to approximately 10% of 
species, approximately 20% of genera, approximately 70% 
of tribes and 100% of subfamilies by 2018. 
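
As a rough check on the species target, the collation rate it implies can be compared with the rate achieved between the first two releases. The sketch below uses only figures given in the text (412 species added in the 3 years since Release 1.0, 1219 species in Release 2.0, a target of 1200 further species in 5 years and a family size of 24 000-30 000 species); the variable and function names are illustrative assumptions.

```python
CURRENT_SPECIES = 1219            # species in Release 2.0
ADDED_SINCE_RELEASE_1 = 412       # species added since Release 1.0
YEARS_BETWEEN_RELEASES = 3
TARGET_ADDITIONAL_SPECIES = 1200  # proposed 5-year target
TARGET_YEARS = 5
FAMILY_SIZE_RANGE = (24_000, 30_000)  # estimated species in Asteraceae


def annual_rate(species: int, years: int) -> float:
    """Average number of species added per year."""
    return species / years


def percent(count: int, total: int) -> float:
    """Share of the family represented, as a percentage."""
    return 100.0 * count / total


observed_rate = annual_rate(ADDED_SINCE_RELEASE_1, YEARS_BETWEEN_RELEASES)
required_rate = annual_rate(TARGET_ADDITIONAL_SPECIES, TARGET_YEARS)
projected_total = CURRENT_SPECIES + TARGET_ADDITIONAL_SPECIES

if __name__ == "__main__":
    print(f"observed: {observed_rate:.1f} species/year")
    print(f"required: {required_rate:.1f} species/year")
    for family_size in FAMILY_SIZE_RANGE:
        print(f"{projected_total} of {family_size} species = "
              f"{percent(projected_total, family_size):.1f}%")
```

On these figures, the target corresponds to approximately 10% of species only at the lower bound of the family size estimate, and it requires a higher annual rate of collation than that achieved between Releases 1.0 and 2.0.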